Text File | 1991-12-26 | 70.2 KB | 1,061 lines |
- *DISCLAIM,A
- IMPORTANT:
- Always consider WATSTAT's recommendations as a STARTING POINT and NOT
- THE FINAL WORD: they are merely intended to serve as guides to further study
- and consultation. WATSTAT can only recommend what is USUALLY appropriate,
- given the specifications you provide. Other unspecified factors may over-
- ride those that WATSTAT considers. Moreover, it would be unwise to ignore
- such "non-statistical" factors as: what procedures make the most theoretical
- sense; what procedures are established and expected in your field; and what
- procedures you and your readers will be able to interpret.
- *RAND,A
- NOTE: Since you specified Random Sampling or Random Assignment, it is
- legitimate to use INFERENTIAL STATISTICS (Significance Tests & Confidence
- Limits) as well as DESCRIPTIVE STATISTICS. But when you use Inferential
- statistics, you must still report important Descriptive statistics, such as
- means & standard deviations, percentages, or correlation coefficients.
- *NONRAND,A
- NOTE: Since you have a non-random sample, NO INFERENTIAL STATISTICS
- (such as Significance Tests or Confidence limits) are appropriate. Hence,
- WATSTAT will recommend only DESCRIPTIVE STATISTICS.
- *WHAT_DES,A
- Report all Descriptive statistics needed to characterize your sample
- (e.g., demographics) and, depending upon your analytical focus, report those
- that most clearly show: 1) the magnitude of sub-sample differences; 2) the
- strength & direction of associations; or 3) the characteristics of a single
- variable's distribution, e.g., its "average," "dispersion," and "shape."
- In deciding what Descriptive statistics to report, ask yourself: "What
- information will a reader need to REPLICATE my analysis or to COMPARE my
- results to those of others?"
- *D-UNI-NOM,A
- Summarize the distribution with a percentage table and point out the
- Modal and sparse categories. Optionally, present percentages graphically
- in a bar or pie chart.
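- For illustration, a percentage table and the Modal category might be
- computed as in the following Python sketch (the data and function name are
- hypothetical, not part of any standard package):

```python
from collections import Counter

def percentage_table(values):
    """Return {category: percentage of cases} for a Nominal variable."""
    counts = Counter(values)
    n = len(values)
    return {cat: 100.0 * c / n for cat, c in counts.items()}

# Hypothetical Nominal data (e.g., coded affiliations).
data = ["A", "B", "A", "C", "A", "B"]
table = percentage_table(data)
modal = max(table, key=table.get)   # the Modal category
```

- Sparse categories are simply those with the smallest percentages in the
- resulting table.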
- *D-NOM-SMALL,A
- CAUTION: Due to your small sample size, each case counts for more than 1%
- and a seemingly large between-category % difference could be due to very few
- cases. Take this into account in deciding whether percentage differences
- reflect important substantive differences in the cases you're describing.
- *D-UNI-RANK,A
- If your data are inherently in the form of ranks, sample size determines
- all the key descriptive statistics, so there is no need to report them. You
- should, however, report the number of ties and the ranks on which most ties
- occur.
- If you have an Ordinal variable (not originally in ranks) the Median is
- the appropriate "average" and the Quartile Deviation the appropriate index
- of "dispersion." Usually, it is also appropriate to report some additional
- Percentiles to give a more complete picture of the variable's distribution,
- for example, the 25th & 75th Percentiles, or the upper and lower Deciles.
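- The Median and Quartile Deviation might be computed as in the minimal
- Python sketch below (the linear-interpolation percentile rule and the data
- are illustrative assumptions):

```python
def percentile(sorted_xs, p):
    """Linear-interpolation percentile (p in 0..100) on a sorted list."""
    k = (len(sorted_xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (k - lo)

def quartile_deviation(xs):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    s = sorted(xs)
    return (percentile(s, 75) - percentile(s, 25)) / 2.0

scores = [2, 3, 3, 4, 5, 6, 6, 7, 9]        # hypothetical Ordinal scores
median = percentile(sorted(scores), 50)
qd = quartile_deviation(scores)
```

- The same `percentile` routine yields the 25th & 75th Percentiles or the
- upper and lower Deciles directly.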
- *D-UNI-PART,A
- If your Ordinal categories allow, compute the Median and Quartile Devia-
- tion to index the "average" and "degree of dispersion," respectively. If
- data are inherently grouped and if it is inappropriate to compute the Median
- exactly, report the category it falls in and its approximate location in the
- category. Summarize the distribution with a percentage table and point out
- the Modal and sparse categories. Optionally, present percentages graphically
- in a bar or pie chart.
- *D-UNI-INT,A
- If your data are dichotomized, report the cut-point that divides the
- categories and the percentage (or proportion) of cases in each category.
- If your data are continuous or grouped into 3 or more categories, use the
- Mean and Standard Deviation to index the "average" and "dispersion" of the
- distribution. If the distribution is highly skewed or if there are some
- extreme values that could make the Mean a "misleading average," report the
- Median instead of, or in addition to, the Mean. Whether or not the data are
- skewed, it is usually wise to report some key Percentiles to provide a more
- complete picture of the distribution, for example, the 25th & 75th Percent-
- iles, or the upper and lower Deciles.
- If the data are grouped, a Percentage Table or equivalent graphic (e.g.,
- a bar chart) is usually appropriate. If you don't use a percentage table
- with grouped data, consider reporting where the Mode falls and which, if
- any, categories are exceptionally sparse.
- If the data are continuous and if it is important to describe the shape
- of the distribution, consider grouping the data and using procedures noted
- in the preceding paragraph. Alternatively, you could present the data in a
- Frequency Polygon (line chart) or in an Ogive (a line chart that shows the
- cumulative frequency distribution).
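- The basic indices above are available in Python's standard library. The
- sketch below (invented data, including one extreme value) shows how an
- extreme score can pull the Mean away from the Median:

```python
import statistics

scores = [12, 15, 15, 16, 18, 20, 22, 25, 40]   # 40 is an extreme value
mean = statistics.mean(scores)                  # pulled upward by the outlier
sd = statistics.stdev(scores)                   # sample Standard Deviation
median = statistics.median(scores)              # a more resistant "average"
```

- Here the Mean (about 20.3) exceeds the Median (18), so reporting both
- gives the reader a fuller picture of the distribution.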
- *D-COMP1-NOM,A
- Percentage tables are usually the best for comparing Nominal distribu-
- tions across sub-samples. Use Percentage Differences to index the magnitude
- of sub-sample differences, and point out the Modal and sparse categories for
- each sub-sample. Optionally, present percentages graphically in bar charts.
- *D-COMP2-NOM,A
- Percentage tables are usually the best for comparing Nominal distribu-
- tions across sub-samples. Use Percentage Differences to index the magnitude
- of sub-sample differences, and point out the Modal and sparse categories for
- each sub-sample. Multivariate percentage tables are appropriate for showing
- differences across two or more Independent (Comparison) variables, especial-
- ly when there are important Interaction (Specification) effects. However,
- such tables are more difficult to read, so it is usually advisable to break
- them into a set of bivariate Partial Tables. Standardized Percentage Tables
- can be used to adjust for one or more Comparison variables without showing
- them directly in the tables, but standardization can only be used for Com-
- parison variables that do not Interact with others. As an alternative to
- tables, consider presenting percentages graphically in bar charts.
- *D-COMP-RANK,A
- If your Dependent variable is inherently in the form of ranks, your best
- option is probably to compare Mean Ranks across sub-samples. However, keep
- in mind that Mean Ranks are not the same as means computed on Interval data,
- so the absolute size of sub-sample differences is not meaningful: focus only
- on "greater-than" and "less-than" relationships between Mean Ranks of your
- sub-samples. Unless ties are rare, report the number of ties and the ranks
- on which most ties occur.
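- Mean Ranks are easy to compute once tied cases are assigned the average of
- the ranks they jointly occupy. A Python sketch (scores and group labels are
- invented for illustration):

```python
def average_ranks(values):
    """Assign ranks 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0            # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mean_rank(ranks, group, label):
    sub = [r for r, g in zip(ranks, group) if g == label]
    return sum(sub) / len(sub)

scores = [3, 1, 4, 1, 5]           # two cases tied at 1
group  = ["A", "B", "A", "B", "A"]
ranks = average_ranks(scores)
```

- Here sub-sample A has a higher Mean Rank (4.0) than B (1.5); only that
- ordering, not the size of the gap, should be interpreted.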
- If your Ordinal Dependent variable is not ranked, the Median is the
- appropriate "average" and the Quartile Deviation the appropriate index of
- "dispersion." Compare Medians across sub-samples, and search for possible
- "interaction effects" between Comparison variables. Focus on the RELATIVE
- SIZE of sub-sample Medians (i.e., "greater-than" & "less-than" relations),
- because the absolute magnitude of Ordinal-scale Medians is not meaningful.
- Usually, it is also appropriate to report some additional Percentiles (e.g.,
- the 25th & 75th Percentiles or the highest & lowest Deciles) to give a more
- complete picture of each sub-sample distribution.
- *D-COMP-PART,A
- The best way to assess differences on a "Partially Ordered" variable
- depends on whether you're able to compute sub-sample Medians.
- If your data allow you to determine Medians exactly, report the Medians
- for all sub-samples and focus on the RELATIVE SIZE of sub-sample Medians
- (i.e., "greater-than" & "less-than" relations), since the absolute magnitude
- of Ordinal-scale Medians is not meaningful. If you have two or more Compar-
- ison Variables, search for possible "interactions" between these variables.
- If the grouping of data doesn't allow you to compute Medians, you won't
- be able to compare sub-sample "averages" in a way that takes full advantage
- of the Dependent variable's Ordinal properties. The best approach in this
- case is to present the data in Percentage Tables, which assume only Nominal
- measurement. (Optionally, present percentages graphically in bar charts.)
- Use % Differences to index the magnitude of sub-sample differences and point
- out the Modal and sparse categories for each sub-sample. Since you should
- be able to specify the CATEGORIES THAT CONTAIN THE MEDIAN for the various
- sub-samples, you can also base comparisons on the APPROXIMATE location of
- Medians; since categories are ordered, you should also be able to interpret
- an approximate difference in Medians as evidence that one sub-sample has a
- higher "average" than another.
- *D-COMP1-INT,A
- With Interval Dependent Variables it is usually appropriate to base
- sub-sample comparisons on Means. Report all sub-sample Means and Standard
- Deviations.
- *D-COMP2-INT,A
- If you have two or more Comparison Variables, search for possible inter-
- actions. If you have one or more Interval-Level Independent variables that
- you wish to control ("hold constant"), Analysis of Covariance procedures can
- be used to adjust sub-sample Means for such variables.
- *D-COMP-DICH,A
- Percentage tables are usually best for comparing Dichotomous Dependent
- variables across sub-samples, but it may be appropriate to use Rates or
- Proportions rather than %'s, especially if the Dependent variable represents
- a relatively rare occurrence, such as a disease or mortality outcome. [Note
- that Rates & Proportions may be analyzed and tabulated in much the same way
- as Percentages, although they are expressed on different scales.]
- Use % Differences [or Rate or Proportion Differences] to index the magni-
- tude of sub-sample differences, and point out the Modal and sparse catego-
- ries for the various sub-samples. Multivariate tables are appropriate for
- showing differences across two or more Independent (Comparison) variables,
- especially when important Interaction (Specification) effects are present.
- However, such tables are more difficult to read, so it may be advisable to
- break them into a set of bivariate Partial Tables. "Standardized Partial
- Percentage Tables" can be used to adjust for one or more Independent vari-
- ables without showing them directly in the tables, but standardization can
- only be used for Independent variables that do not Interact with others.
- Instead of tables, consider presenting Percentages [or Rates or Proportions]
- in graphic charts.
- *D-COMP-OTHER2,A
- Except for Interval Dependent Variables, there is no procedure designed
- to handle simultaneous sub-sample comparisons for 2 or more Dependent vari-
- ables. Your only option is to run a separate analysis for each Dependent
- variable. To get recommendations appropriate for these separate analyses,
- return to WATSTAT's Choice Boxes and select an Option other than "2 or More
- Dependent Variables" in Box 4.
- *D-BIVAR-NOM/NOM,A
- If the two Nominal variables are dichotomized, use the Phi Coefficient
- as a measure of association. If either or both of your Nominal variables
- has 3 or more categories, use Cramer's V, which is the same as Phi except
- that it adjusts for the number of categories.
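- Both statistics derive from the Chi-Square of the cross-tabulation: Phi is
- the square root of Chi-Square over N, and Cramer's V divides Chi-Square by
- N times one less than the smaller table dimension. A Python sketch (the
- cell counts are invented):

```python
import math

def chi_square(table):
    """Pearson Chi-Square for a contingency table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
    return chi2, n

def cramers_v(table):
    """Cramer's V; equals |Phi| for a 2x2 table."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

table = [[30, 10],
         [10, 30]]       # hypothetical 2x2 cross-tabulation
v = cramers_v(table)
```

- For a 2x2 table Cramer's V reduces to Phi, as in the example (both 0.5).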
- *D-BIVAR-NOM/RANK,A
- There is no statistic specifically designed to measure the association
- between a Nominal Dependent variable and an Ordinal Independent variable.
- Your only choice is to break the Ordinal variable into categories and treat
- it as Nominal. If you dichotomize it, select a cut-point as close to the
- Median as possible; if you break it into 3 or more categories, select cut-
- points that yield approximately equal frequencies across categories. Once
- the Ordinal variable is categorized, the appropriate statistics are those
- for two Nominal variables.
- If the two Nominal variables are dichotomized, use the Phi Coefficient
- as a measure of association. If either or both of your Nominal variables
- has 3 or more categories, use Cramer's V, which is the same as Phi except
- that it adjusts for the number of categories.
- *D-BIVAR-NOM/PART,A
- There is no statistic specifically designed to measure the association
- between a Nominal Dependent variable and an Independent variable that is
- cast in the form of Ordinal categories. Your only choice is to treat the
- Ordinal variable as if it were a set of Nominal categories, and the only
- appropriate statistics are those for two Nominal variables.
- If the two Nominal variables are dichotomized, use the Phi Coefficient
- as a measure of association. If either or both of your Nominal variables
- has 3 or more categories, use Cramer's V, which is the same as Phi except
- that it adjusts for the number of categories.
- *D-BIVAR-NOM/INT,A
- There is no statistic specifically designed to measure the association
- between a Nominal Dependent variable and an Interval Independent variable,
- so you have two OPTIONS: 1) break the Interval variable into categories and
- treat it as Nominal, or 2) dichotomize the Dependent variable and treat it
- as Interval.
- If you choose OPTION 1, break the Independent variable into categories
- that contain approximately equal numbers of cases. Once this is done, the
- appropriate statistics are those for two Nominal variables.
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- If you choose OPTION 2, dichotomize the Dependent variable as close as
- possible to the Median unless there is theoretical justification for using
- another "high vs. low" cut-point. The dichotomized Dependent variable may
- now be assigned arbitrary scores of 0 for "low" and 1 for "high" and may,
- within limits, be treated as an Interval scale. Once this is done, you can
- use the Linear Correlation Coefficient (Pearson's r and r-squared) to index
- the strength and direction of the relationship. But if your problem calls
- for regression statistics, Linear Regression may not be appropriate: with a
- dichotomous Dependent variable some predicted (Y') scores may have impossi-
- ble values (less than 0 or greater than 1). If these impossible values are
- numerous or if they will cause problems in interpreting your results, use
- Logistic Regression instead.
- *D-BIVAR-RANK/NOM,A
- There is no statistic specifically designed to measure the association
- between an Ordinal Dependent variable and a Nominal Independent variable.
- Your only choice is to break the Ordinal variable into categories and treat
- it as Nominal. If you dichotomize it, select a cut-point as close to the
- Median as possible; if you break it into 3 or more categories, select cut-
- points that yield approximately equal frequencies across categories. Once
- the Ordinal variable is categorized, the appropriate statistics are those
- for two Nominal variables.
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- *D-BIVAR-RANK/RANK,A
- If both variables are in the form of ranks, you can proceed to compute one
- of the measures of association noted below. Otherwise, you must transform
- them to ranks before proceeding.
- Spearman's Rho is the best known measure of association for two Ordinal
- variables and, because it is simply the Linear Correlation Coefficient
- (Pearson's r) applied to ranks, it is often interpreted as an approximate
- index of linear correlation. The "correction for ties" should be applied
- to Rho, but it has little effect if fewer than 30% of the cases are tied.
- In some fields the preferred statistic is Kendall's Tau, which, unlike
- Spearman's Rho, does not involve any arithmetical operations that assume
- an underlying Interval Scale. This statistic is sometimes referred to as
- "Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are
- applied to "ordered contingency tables." The computing formulas for Tau-A
- found in most texts incorporate a correction for tied ranks.
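- Both indices can be computed directly from two columns of ranks. The
- Python sketch below uses invented ranks with no ties, so no tie correction
- is applied; it illustrates the definitions only:

```python
def spearman_rho(rx, ry):
    """Pearson's r applied to two columns of ranks."""
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def kendall_tau_a(x, y):
    """Tau-A: (concordant - discordant) pairs over total pairs."""
    n = len(x)
    num = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])
            dy = (y[i] > y[j]) - (y[i] < y[j])
            num += dx * dy
    return num / (n * (n - 1) / 2)

rx = [1, 2, 3, 4, 5]      # hypothetical ranks on variable X
ry = [2, 1, 4, 3, 5]      # hypothetical ranks on variable Y
rho = spearman_rho(rx, ry)
tau = kendall_tau_a(rx, ry)
```

- Note that Tau is typically smaller than Rho for the same data (here 0.6
- vs. 0.8); the two are on different scales and shouldn't be compared.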
- *D-BIVAR-RANK/PART,A
- There is no statistic specifically designed to measure the association
- between a "true" Ordinal Dependent variable and a "partially ordered"
- Independent variable. Your best choice is to break the Dependent variable into
- ordered categories and treat both variables as "partially ordered." Prior
- to computations, copy the data into a contingency table in which rows are
- categories of the Dependent variable and columns are categories of the
- Independent variable. Use one of the following measures of association:
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
- *D-BIVAR-RANK/INT,A
- There is no statistic specifically designed to measure the association
- between an Ordinal Dependent variable and an Interval Independent variable.
- If you can't assume that the Dependent variable is Interval, you'll have to
- "downgrade" the Independent variable and treat it as an Ordinal scale. If
- you can transform it to ranks, do so, and apply one of the measures of
- association recommended below. [If it is so grouped that it can only be
- transformed into a set of ordered categories, go back thru WATSTAT's Choice
- Boxes and pick Option 3, "Ordered Categories," as the Level of Measurement
- for the Independent variable.]
- Spearman's Rho is the best known measure of association for two Ordinal
- variables and, because it is simply the Linear Correlation Coefficient
- (Pearson's r) applied to ranks, it is often interpreted as an approximate
- index of linear correlation. The "correction for ties" should be applied to
- Rho, but it has little effect if fewer than 30% of the cases are tied.
- In some fields the preferred statistic is Kendall's Tau, which, unlike
- Spearman's Rho, does not involve any arithmetical operations that assume
- an underlying Interval Scale. This statistic is sometimes referred to as
- "Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are
- applied to "ordered contingency tables." The computing formulas for Tau-A
- found in most texts incorporate a correction for tied ranks.
- *D-BIVAR-PART/NOM,A
- There is no statistic specifically designed to measure the association
- between a set of ordered categories and a Nominal Independent variable, and
- your only option is to "downgrade" the Dependent variable to the Nominal
- level. For two Nominal variables the following recommendations apply.
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- *D-BIVAR-PART/RANK,A
- There is no statistic specifically designed to measure the association
- between a "partially ordered" Dependent variable and a "true" Ordinal
- Independent variable. Your best choice is to break the Independent variable
- into ordered categories and treat both variables as "partially ordered."
- Prior to computations, copy the data into a contingency table in which rows
- are categories of the Dependent variable and columns are categories of the
- Independent variable. Use one of the following measures of association:
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
- *D-BIVAR-PART/PART,A
- Prior to computations, copy the data into a contingency table in which
- rows are categories of the Dependent variable and columns are categories of
- the Independent variable. Use one of the following measures of association:
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
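- Both forms count concordant and discordant pairs of cases in the table;
- Tau-B adjusts the denominator for row and column ties, while Tau-C adjusts
- for table size. A Python sketch (the cell frequencies are invented):

```python
import math

def tau_b_and_c(table):
    """Tau-B and Tau-C for an ordered contingency table (rows x cols)."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(r) for r in table)
    P = Q = 0                                  # concordant / discordant pairs
    for i in range(rows):
        for j in range(cols):
            f = table[i][j]
            P += f * sum(table[a][b] for a in range(i + 1, rows)
                                     for b in range(j + 1, cols))
            Q += f * sum(table[a][b] for a in range(i + 1, rows)
                                     for b in range(j))
    n0 = n * (n - 1) / 2
    n1 = sum(t * (t - 1) / 2 for t in (sum(r) for r in table))        # row ties
    n2 = sum(t * (t - 1) / 2 for t in (sum(c) for c in zip(*table)))  # col ties
    tau_b = (P - Q) / math.sqrt((n0 - n1) * (n0 - n2))
    m = min(rows, cols)
    tau_c = 2 * m * (P - Q) / (n ** 2 * (m - 1))
    return tau_b, tau_c

table = [[30, 10],
         [10, 30]]       # hypothetical ordered 2x2 table
tb, tc = tau_b_and_c(table)
```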
- *D-BIVAR-PART/INT,A
- There is no statistic specifically designed to measure the association
- between a "partially ordered" Dependent variable and an Interval Independent
- variable. The best alternative is to break the Independent variable into
- ordered categories and treat both variables as "partially ordered." Prior
- to your computations, copy the data into a contingency table in which rows
- are categories of the Dependent variable and columns are categories of the
- Independent variable. Then use one of the following indices of association:
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
- *D-BIVAR-INT/NOM,A
- The preferred measure of association for an Interval Dependent variable
- and a Nominal Independent variable is the Correlation Ratio (Eta). The Eta
- statistic indexes the strength of a relationship of any form, including
- non-monotonic (e.g., U-shaped). Eta-Squared is commonly reported instead of
- Eta, since it has a more meaningful interpretation: it measures the propor-
- tion of variance in the Dependent variable explained by the categories of
- the Independent variable.
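- Eta-Squared is the between-category Sum of Squares divided by the total
- Sum of Squares. A minimal Python sketch (group labels and scores invented):

```python
def eta_squared(groups):
    """Eta-Squared: proportion of Dependent-variable variance explained
    by the categories of the Independent variable.

    `groups` maps each category to its list of Dependent-variable scores."""
    all_scores = [x for g in groups.values() for x in g]
    grand = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2
                     for g in groups.values())
    return ss_between / ss_total

groups = {"A": [1, 2, 3], "B": [7, 8, 9]}   # hypothetical data
e2 = eta_squared(groups)
```

- In this (invented) example the group means differ sharply, so Eta-Squared
- is high (about .93); Eta itself is its square root.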
- *D-BIVAR-INT/RANK,A
- There is no statistic specifically designed to measure the association
- between an Interval Dependent variable and an Ordinal Independent variable.
- If you can't assume that the Independent variable is Interval, you'll have to
- "downgrade" the Dependent variable and treat it as an Ordinal scale. If
- you can transform it to ranks, do so, and apply one of the measures of
- association recommended below. [If it is so grouped that it can only be
- transformed into a set of ordered categories, go back thru WATSTAT's Choice
- Boxes and pick Option 3, "Ordered Categories," as the Level of Measurement
- for the Dependent variable.]
- Spearman's Rho is the best known measure of association for two Ordinal
- variables and, because it is simply the Linear Correlation Coefficient
- (Pearson's r) applied to ranks, it is often interpreted as an approximate
- index of linear correlation. The "correction for ties" should be applied
- to Rho, but it has little effect if fewer than 30% of the cases are tied.
- In some fields the preferred statistic is Kendall's Tau, which, unlike
- Spearman's Rho, does not involve any arithmetical operations that assume
- an underlying Interval Scale. This statistic is sometimes referred to as
- "Tau-A" to distinguish it from modified forms ("Tau-B" and "Tau-C") that are
- applied to "ordered contingency tables." The computing formulas for Tau-A
- found in most texts incorporate a correction for tied ranks.
- *D-BIVAR-INT/PART,A
- There is no statistic specifically designed to measure the association
- between an Interval Dependent variable and a "partially ordered" Independent
- variable, so you have 2 OPTIONS: 1) "downgrade" the Dependent variable by
- breaking it into ordered categories, or 2) "downgrade" the Independent vari-
- able to a Nominal scale. OPTION 2 is the best choice if you're interested
- mainly in the strength of the relationship, but since the Independent vari-
- able is assumed to be merely Nominal, you won't be able to determine the
- direction (+/-) of the relationship.
- If you choose OPTION 1, you should break the Dependent variable into cat-
- egories that contain approximately equal numbers of cases. Copy the data
- into a contingency table in which rows are categories of the Dependent vari-
- able and columns are categories of the Independent variable. Then compute
- one of the following indices recommended for ordered contingency tables.
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
- If you choose OPTION 2, every category of the Independent variable MUST
- contain at least 2 cases (preferably more), so you might have to collapse
- some sparse categories. However, categories should not be collapsed without
- restraint: it is also desirable to have as many categories as possible.
- The preferred measure of association for an Interval Dependent variable
- and a Nominal Independent variable is the Correlation Ratio (Eta). The Eta
- statistic indexes the strength of a relationship of any form, including
- non-monotonic (e.g., U-shaped). The square of the Eta (Eta-Squared) is
- commonly reported instead of Eta, since it has a more meaningful interpret-
- ation: it measures the proportion of variance in the Dependent variable
- explained by the categories of the Independent variable.
- *D-BIVAR-INT/INT,A
- In most situations the preferred index of association for two Interval
- variables is the Linear Correlation Coefficient, also called Pearson's r.
- The square of the r statistic, known as the Coefficient of Determination, is
- often reported along with r, because it measures the proportion of variance
- in one variable explained by the other.
- If you're interested in predicting or estimating scores on the Dependent
- variable from those on the Independent variable, you should compute the
- Linear Regression statistics: the Regression Coefficient, the Y-Intercept,
- and the Standard Error of Estimate.
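- All four statistics come from the same sums of squares and cross-products,
- as the following Python sketch shows (the paired scores are invented):

```python
def linear_stats(x, y):
    """Pearson's r, Regression Coefficient, Y-Intercept, and the
    Standard Error of Estimate for paired Interval scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / (sxx * syy) ** 0.5
    b = sxy / sxx                        # Regression Coefficient (slope)
    a = my - b * mx                      # Y-Intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = (sse / (n - 2)) ** 0.5         # Standard Error of Estimate
    return r, b, a, see

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, slope, intercept, see = linear_stats(x, y)
```

- Here r-squared (the Coefficient of Determination) is .60, and predicted
- scores are Y' = 2.2 + 0.6X.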
- If you suspect that the relationship departs markedly from linearity, so
- that Pearson's r underestimates its "true" strength, you can use the Correl-
- ation Ratio (Eta) instead. This will require breaking the Independent vari-
- able into a set of categories, preferably in such a way that 5 or more cases
- fall in each category. Eta indexes the strength of a relationship of any
- form, including those which are non-monotonic (e.g., U-shaped). Eta-squared
- is commonly reported instead of Eta, because it has a more meaningful inter-
- pretation: it measures the proportion of variance in the Dependent variable
- explained by the categories of the Independent variable.
- *D-BIVAR-DICH/NOM,A
- Even if your dichotomous Dependent variable is Ordinal or Interval, it is
- probably best to treat it as Nominal, like your Independent variable, and
- use a measure of association for two Nominal variables.
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- *D-BIVAR-DICH/RANK,A
- There is no statistic specifically designed to measure the association
- between a dichotomous Dependent variable and an Ordinal Independent vari-
- able. You'll first have to break the Independent variable into categories
- and then you'll have 2 OPTIONS: 1) assume the Dependent variable is Ordinal
- and use a measure of association for two "partially ordered" variables, or
- 2) assume that both variables are merely Nominal and use a measure for two
- Nominal variables. Option 1 is usually preferable, but choose Option 2 if
- it makes no sense to treat the dichotomous Dependent variable as Ordinal.
- If you choose Option 1, copy the data into an ordered contingency table
- and compute one of the following:
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
- If you choose Option 2, copy the data into a contingency table, making no
- assumption about the order of rows & columns. Then use one of the following
- measures appropriate for two Nominal scales:
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- *D-BIVAR-DICH/PART,A
- With a dichotomous Dependent variable and a "partially ordered" independ-
- ent variable, you have 2 OPTIONS: 1) assume the Dependent variable is also
- Ordinal and use a measure of association for two "partially ordered" vari-
- ables, or 2) assume the Independent variable is only Nominal and use a meas-
- ure of association for two Nominal variables. Option 1 is usually better.
- If you choose Option 1, copy the data into an ordered contingency table
- and compute one of the following:
- The best statistic for most ordered contingency tables is a modified form
- of Kendall's Tau: use Tau-B if the number of rows in the table equals the
- number of columns; use Tau-C if the table is not "square."
- If you choose Option 2, copy the data into a contingency table, making no
- assumption about the order of rows & columns. Then use one of the following
- measures appropriate for two Nominal scales:
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- *D-BIVAR-DICH/INT,A
- With a dichotomous Dependent variable and an Interval Independent vari-
- able, you have 2 OPTIONS: 1) assume that the dichotomy is an Interval vari-
- able, or 2) "downgrade" the Independent variable to the Nominal level. For
- Option 1, which is usually preferable, you'd use a measure of association
- for two Interval variables. For Option 2, you'd first break the Independent
- variable into categories and use a measure of association for two Nominal
- variables.
- If you choose OPTION 1, assign arbitrary scores of 0 (low) and 1 (high)
- to categories of the Dependent variable. Then use the Linear Correlation
- Coefficient (Pearson's r and r-squared) to measure the strength and direc-
- tion (+/-) of the relationship. If you're mainly interested in predicting
- Dependent variable scores from those on the Independent variable, compute
- regression statistics (Regression Coefficient, Y-Intercept, & Standard Error
- of Estimate). But note that Linear Regression may not be appropriate: with
- a dichotomous Dependent variable, some scores predicted from the regression
- equation (Y'= A+bx) may have impossible values (i.e., less than 0 or greater
- than 1). If there are many impossible values or if they will cause problems
- in interpreting your results, use Logistic Regression instead.
- If you take OPTION 2, divide the Independent variable into categories
- that contain about the same number of cases and use one of the following:
- If the two Nominal variables are dichotomized, use the Phi Coefficient as
- a measure of association. If either or both of your Nominal variables has
- 3 or more categories, use Cramer's V, which is the same as Phi except that
- it adjusts for the number of categories.
- *D-MUL-SMALL-INT,A
- WARNING: The SAMPLE SIZE you specified may be TOO SMALL to support the type
- of multivariate procedure(s) WATSTAT recommended. As a practical rule of
- thumb you should have a minimum of about 10 cases for each variable in such
- procedures. To meet this criterion you may have to drop some variables from
- the analysis. If you can't drop enough to approach the 10-case-per-variable
- criterion, you shouldn't use the above procedure(s).
- *D-MUL-SMALL-NOM,A
- WARNING: The SAMPLE SIZE you specified may be TOO SMALL to use Multivariate
- Procedures for Nominal Variables, of the sort recommended. Computations for
- such methods are based on cross-tabulations, and as the number of variables
- (& categories) increases, cell frequencies can become too sparse to support
- the analysis. You may need to drop some variables from the analysis and/or
- collapse variables into fewer categories.
- *D-MUL-1DEP-NOM/NOM,A
- The recommended procedure (and the only one available) for measuring the
- association between a Nominal-level Dependent and a set of Nominal independ-
- ent variables is Log-Linear Analysis. In most cases, this procedure will
- require the use of a computer and many popular statistical software packages
- can run it. A good deal of statistical sophistication is required to apply
- it and to interpret its results. Log-Linear Analysis may not be widely used
- in your field and, if not, the task of reporting your results will be some-
- what more difficult. The use of Log-linear Analysis is also limited by the
- substantial sample size it usually requires.
- However, no alternative procedure is applicable unless you're willing to
- dichotomize the Dependent variable (so it can be scored 0/1 and treated as
- Interval) and to transform all the Independent variables and also treat them
- as Interval. The latter step would involve either: 1) dichotomizing each
- Independent variable and assigning "0" & "1" scores to its categories; or
- 2) creating a set of "dummy variables" (each scored 0/1) to represent its
- categories. After these transformations, you can apply either Logistic
- Regression or Discriminant Analysis. For more info about these procedures,
- return to WATSTAT's Choice Boxes and specify "Dichotomous" for the depen-
- dent (Box 5) variable & "Interval" for the Independent (Box 6) variables.
- *D-MUL-1DEP-NOM/INT,A
- The only procedure designed to assess the association between a Nominal
- Dependent & a set of Interval Independent variables is Discriminant Analysis.
- This procedure does not produce a single index (analogous to a correlation
- coefficient), but instead yields a set of prediction equations, called
- "Discriminant Functions," the interpretation of which requires a good deal
- of statistical expertise. Computations must be done by computer and most
- statistical software packages include Discriminant Analysis routines.
- Interpretation of results is considerably simpler if the Dependent vari-
- able is dichotomized, but if this is done, Logistic Regression and Multiple
- Correlation/Regression would also be applicable and perhaps preferable.
- *D-MUL-1DEP-NOM/MIXIO,A
- There is no procedure available to measure association between a Nominal
- Dependent variable and Independent variables with "mixed" levels of measure-
- ment, so you'll need to transform one or more Independent variables to make
- them all either Nominal or Interval. In the former case, you'd simply break
- your Interval or Ordinal variables into categories and proceed as if they
- were Nominal. In the latter, you'd transform each Ordinal or Nominal inde-
- pendent variable to Interval by either: 1) dichotomizing it and assigning
- scores of "0" and "1" to its categories; or 2) breaking it into categories
- and creating a set of "dummy variables" (each scored 0/1) to represent its
- categories.
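- [The two transformations just listed can be sketched in Python with NumPy
- (not part of WATSTAT); the category labels are hypothetical.]

```python
# Transforming a Nominal Independent variable: 1) dichotomize it (0/1),
# or 2) represent its categories as a set of 0/1 "dummy variables".
import numpy as np

region = np.array(["North", "South", "West", "North", "West"])

# 1) Dichotomize: e.g., "North" vs. everything else, scored 1/0
north_vs_rest = (region == "North").astype(int)

# 2) Dummy variables: one 0/1 column per category, dropping one
#    category ("North") as the reference to avoid redundancy
categories = ["South", "West"]
dummies = np.column_stack([(region == c).astype(int) for c in categories])
print(north_vs_rest)
print(dummies)
```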
- If all Independent variables are Nominal, Log-Linear Analysis may be
- used. For more info about Log-Linear Analysis, return to WATSTAT's Choice
- Boxes and specify "Nominal" measurement for both the Dependent (Box 5) and
- the Independent (Box 6) variables.
- If all Independent variables are Interval (including dichotomies and
- dummy variables), you can use Discriminant Analysis. For more info about
- Discriminant Analysis, return to WATSTAT's Choice Boxes and specify
- "Nominal" for the Dependent (Box 5) and "Interval" for the Independent
- (Box 6) variables.
- *D-MUL-1DEP-NOM/ORD,A
- There is no procedure available to measure association between a Nominal
- Dependent variable and Ordinal Independent variables. Your best alternative
- is to categorize the Ordinal variables and treat them as Nominal; then you
- can use Log-Linear Analysis. For more information on Log-Linear Analysis,
- return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both
- the Dependent (Box 5) and the Independent (Box 6) variables.
- *D-MUL-1DEP-ORD/ALL,A
- There is no multivariate procedure designed to measure the association
- between an Ordinal Dependent variable and a set of 2 or more Independent
- variables. However, if you transform the Dependent variable (and perhaps
- the Independent variables) a number of alternatives may be applicable.
- You have 2 basic OPTIONS: 1) dichotomize the Dependent variable and treat
- it as Interval, or 2) break the Dependent variable into 2 or more categories
- and treat it as Nominal. OPTION 1 is preferable as long as it makes sense
- to dichotomize the Dependent variable.
- If you take OPTION 1, you can use either Multiple Regression/Correlation
- or Logistic Regression, BUT to do so all your Independent variables must
- also be Interval or Dichotomies (i.e., Nominal and Ordinal Independent vari-
- ables must be dichotomized or represented as sets of "dummy variables").
- For more info about Multiple Regression/Correlation, return to WATSTAT's
- Choice Boxes and choose "Interval" measurement for both the Dependent
- variable (Box 5) and Independent (Box 6) variables. For more information on
- Logistic Regression, specify "Dichotomy" (Box 5) and "Interval" (Box 6).
- With OPTION 2, you can use either Discriminant Analysis or Log-Linear
- Analysis. To use Discriminant Analysis, all Independent variables must be
- Interval (i.e., Nominal & Ordinal Independent variables must be dichotomized
- or represented as sets of "dummy variables"). With Log-Linear Analysis, all
- Independent variables must be Nominal (i.e., Ordinal & Interval variables
- must be represented as sets of 2 or more Nominal categories). For more info
- about Discriminant Analysis, return to WATSTAT's Choice Boxes and specify
- "Nominal" for the Dependent (Box 5) and "Interval" for the Independent
- variables. For more info about Log-Linear Analysis, specify "Nominal" for
- both Dependent (Box 5) and Independent (Box 6) variables.
- *D-MUL-1DEP-INT/INT,A
- If your Dependent variable is Interval and all your Independent variables
- are also Interval (or dichotomies) your best choice is Multiple Regression/
- Correlation. Use the Multiple Correlation statistics (R and R-Squared) to
- index the strength of the relation between the Dependent variable and all
- the Independent variables jointly. Use the Regression Coefficients (b)
- to index the effect of each Independent variable and use the Standard Error
- of Estimate to index the precision with which the set of Independent vari-
- ables predict (estimate) scores on the Dependent variable.
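- [The statistics just named can be computed as sketched below in Python with
- NumPy (not part of WATSTAT); the data are simulated for illustration.]

```python
# Multiple Regression/Correlation sketch: R-Squared, the Regression
# Coefficients (b), and the Standard Error of Estimate on made-up data.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b[0] = intercept, b[1:] = slopes

y_hat = X @ b
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot             # Multiple R-Squared
k = 2                                       # number of Independent variables
see = np.sqrt(ss_res / (n - k - 1))         # Standard Error of Estimate
print(f"R-squared = {r_squared:.3f}, SEE = {see:.3f}, b = {b.round(2)}")
```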
- *D-MUL-1DEP-INT/OTHER,A
- There is no multivariate procedure designed to relate an Interval depend-
- ent variable with Nominal or Ordinal Independent variables. However, after
- some simple transformations, you can treat Nominal and Ordinal variables as
- if they were Interval and use Multiple Correlation/Regression procedures.
- Dichotomous Independent variables (scored 1/0) can be treated as Interval
- in these procedures and you can dichotomize whenever it makes sense to treat
- a Nominal variable as "present" vs. "absent" (1 vs. 0) or an Ordinal vari-
- able as "high" vs. "low" (1 vs. 0). However, it is often desirable to pre-
- serve a more detailed representation of Nominal & Ordinal variables: this
- can be done by dividing them into categories and using a SET of dichotomous
- variables, called "dummy variables," to represent the categories.
- Use the Multiple Correlation statistics (R and R-Squared) to index the
- strength of the relation between the Dependent variable and all the indepen-
- dent variables operating jointly. Use the Regression Coefficients (b-values)
- to index the effect of each Independent variable and use the Standard Error
- of Estimate to index the precision with which the set of Independent vari-
- ables predicts (estimates) scores on the Dependent variable.
- *D-MUL-1DEP-DICH/NOM,A
- Log-Linear Analysis is specifically designed to assess association
- between a Nominal Dependent variable and a set of Nominal Independent vari-
- ables. The fact that your Dependent variable is dichotomous presents no
- problems, as long as it makes sense to treat it as a Nominal variable.
- *D-MUL-1DEP-DICH/ORD,A
- There is no procedure designed to measure association between a dichoto-
- mous Dependent variable and Ordinal Independent variables. Your best alter-
- native is to categorize the Ordinal variables and treat them as Nominal;
- then you can use Log-Linear Analysis. For more information about Log-Linear
- Analysis, return to WATSTAT's Choice Boxes and specify "Nominal" measurement
- for both Dependent (Box 5) and Independent (Box 6) variables.
- *D-MUL-1DEP-DICH/INT,A
- Several multivariate procedures are potentially applicable if the depen-
- dent variable is a dichotomy and all the Independent variables are Interval.
- In order of preference, the available options include: Logistic Regression,
- Discriminant Analysis, & Multiple Correlation/Regression. Logistic Regress-
- ion is almost certain to be applicable. Discriminant Analysis is a good
- alternative when category frequencies on the Dependent variable approach a
- 50%/50% split, but should not be used when the split is more extreme than
- 80%/20%. Multiple Correlation/Regression is less generally applicable when
- the Dependent variable is a dichotomy: although the Dependent variable is
- scored 0 and 1 (for "low" & "high") some predicted (Y') scores may attain
- impossible values (less than 0 or greater than 1). If there are many impos-
- sible values, or if such values will cause problems in interpreting your
- results, Multiple (Linear) Correlation/Regression should NOT be used.
- *D-MUL-1DEP-DICH/MIXON,A
- There is no procedure designed to measure association between a dichoto-
- mous Dependent variable and "mixed" Ordinal/Nominal Independent variables.
- Your best alternative is to categorize the Ordinal variables and treat them
- as Nominal; then you can use Log-Linear Analysis, which assumes that all the
- Independent variables are Nominal. For more info about Log-Linear Analysis,
- return to WATSTAT's Choice Boxes and specify "Nominal" measurement for both
- Dependent (Box 5) and Independent (Box 6) variables.
- *D-MUL-1DEP-DICH/MIXIO,A
- There is no procedure designed to measure association between a dichoto-
- mous Dependent variable and Independent variables with "mixed" measurement
- levels, so you'll need to transform one or more Independent variables to
- make them ALL either Nominal or Interval. In the former case, you'd simply
- break any Interval or Ordinal variables into categories and proceed as if
- they were Nominal. In the latter, you'd transform each Ordinal or Nominal
- Independent variable to Interval by either: 1) dichotomizing it and assign-
- ing scores of "0" and "1" to its categories; or 2) breaking it into catego-
- ries and creating a set of "dummy variables" (each scored 0/1) to represent
- the categories.
- If all Independent variables can be treated as Nominal, you can use
- Log-Linear Analysis. For more info about Log-Linear Analysis, return to
- WATSTAT's Choice Boxes and specify "Nominal" measurement for both Dependent
- (Box 5) and Independent (Box 6) variables.
- If all Independent variables are Interval (including dichotomies and
- dummy variables), you can use Logistic Regression or Discriminant Analysis.
- For more info about these procedures, return to WATSTAT's Choice Boxes and
- specify "Dichotomy" for the Dependent (Box 5) variable and "Interval" for
- the Independent (Box 6) variables.
- *D-MUL-2DEP-INT/INT,A
- Several multivariate procedures are potentially applicable when all your
- variables are Interval and you're dealing with 2 or more Dependent variables
- simultaneously. They include: Canonical Correlation; measures of association
- derived from MANOVA; and various Structural Equation Modelling procedures,
- e.g., LISREL and EQS. All these assume advanced statistical training and
- must be performed by computer. Moreover, so much additional information is
- needed to choose from these alternatives that WATSTAT cannot recommend a
- "best" procedure here.
- *D-MUL-2DEP-INT/NOTINT,A
- Several multivariate procedures are potentially applicable when you're
- dealing with 2 or more Dependent variables simultaneously. They include:
- Canonical Correlation, measures of association derived from MANOVA, and
- various procedures for Structural Equation Modelling (e.g., LISREL and EQS).
- However, all require advanced statistical training and must be performed by
- computer. Further, all assume Interval measurement for ALL variables, so
- you won't be able to use them unless you drop "lower-level" variables or
- transform them to sets of dummy variables. Finally, so much additional
- information is needed to choose from these alternatives that WATSTAT can't
- recommend a "best" procedure here.
- *D-MUL-2DEP-NOTINT,A
- Several multivariate procedures are potentially applicable when you're
- dealing with 2 or more Dependent variables simultaneously. They include:
- Canonical Correlation, measures of association derived from MANOVA, and
- various procedures for Structural Equation Modelling (e.g., LISREL and EQS).
- However, all require advanced statistical training and must be performed by
- computer. Further, all assume Interval measurement for ALL variables in the
- analysis, so you probably won't be able to use them. Finally, so much addi-
- tional information is needed to choose from these alternatives that WATSTAT
- can't recommend a "best" procedure here.
- *D-MUL-NODEP-INT,A
- Factor Analysis is recommended for assessing relationships among several
- Interval-level variables when there is no Dependent variable identified.
- [Dichotomous variables, scored 0/1, may also be Factor Analyzed.]
- There are many types of Factor Analysis and selecting the appropriate
- type is too complicated for WATSTAT to handle: you'll need to consult a
- specialized text on Factor Analysis. Computations require a computer, and
- most popular statistical packages offer a variety of Factor Analysis proce-
- dures. [The manuals for some of these packages are good sources of advice
- on which type of Factor Analysis to apply.]
- *D-MUL-NODEP-RANK,A
- Kendall's Coefficient of Concordance (Kendall's W) is designed to assess
- relationships among 3 or more Ordinal variables when there is no Dependent
- variable identified. All variables must be transformed to RANKS if they are
- not inherently in rank form. The interpretation of Kendall's W is facili-
- tated by its linear relationship to "Average Rho," i.e., the mean rank-order
- correlation (Spearman's Rho) between all possible pairs of variables.
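- [The computation of W and its link to "Average Rho" can be sketched in
- Python with NumPy/SciPy (not part of WATSTAT); the three rankings of five
- cases below are hypothetical.]

```python
# Kendall's W for 3 Ordinal variables expressed as RANKS of 5 cases,
# plus the identity Average Rho = (m*W - 1) / (m - 1).
import numpy as np
from itertools import combinations
from scipy.stats import spearmanr

ranks = np.array([[1, 2, 3, 4, 5],    # variable 1 (ranks of the 5 cases)
                  [2, 1, 3, 5, 4],    # variable 2
                  [1, 3, 2, 4, 5]])   # variable 3
m, n = ranks.shape

rank_sums = ranks.sum(axis=0)
s = np.sum((rank_sums - rank_sums.mean()) ** 2)
w = 12 * s / (m ** 2 * (n ** 3 - n))            # Kendall's W

# Mean Spearman's Rho over all pairs of variables
avg_rho = np.mean([spearmanr(a, b)[0] for a, b in combinations(ranks, 2)])
print(f"W = {w:.3f}, average rho = {avg_rho:.3f}")
```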
- *D-MUL-NODEP-NOTINT,A
- Factor Analysis is the only widely-used procedure designed to assess
- relationships among several variables when there is no Dependent variable
- identified. Unfortunately, this procedure assumes that all variables are
- Interval, so you can't use it for your "lower level" variables. However,
- dichotomies (scored 0/1) may be treated as Interval here, so if you can
- dichotomize your "lower level" variables, you can apply Factor Analysis.
- *S-UNI-NOM,A
- Assuming only Nominal Measurement, the Chi-Square Goodness-of-Fit Test
- may be used to test whether it's likely that your RANDOM SAMPLE came from a
- POPULATION with an hypothesized proportion of cases in its various catego-
- ries. You specify the Population proportions (P) in the Null Hypothesis and
- multiply each P by Sample Size to obtain EXPECTED FREQUENCIES for the test.
- Within limits, you may specify any set of P's derived from theory or prior
- knowledge of a relevant population.
- If your variable is Dichotomous, the Binomial Test is preferable to the
- Chi-Square Goodness-of-Fit, especially when sample size is small. Use Exact
- Binomial Tables for small sample sizes and the Normal Approximation (z-Test)
- for larger (>25) samples.
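- [Both tests can be sketched in Python with SciPy (not part of WATSTAT);
- the observed frequencies and Null-Hypothesis P's below are hypothetical.]

```python
# Chi-Square Goodness-of-Fit against specified Population proportions,
# and a Binomial Test for a dichotomous variable.
from scipy import stats

observed = [18, 55, 27]               # e.g., Low / Medium / High counts
p_null = [0.25, 0.50, 0.25]           # P's from theory or prior knowledge
n = sum(observed)
expected = [p * n for p in p_null]    # each P times Sample Size
chi2, p_value = stats.chisquare(observed, f_exp=expected)

# Binomial Test: 32 "successes" in 50 cases against P = .50
binom_p = stats.binomtest(32, n=50, p=0.5).pvalue
print(f"chi2 = {chi2:.2f} (p = {p_value:.3f}), binomial p = {binom_p:.3f}")
```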
- *S-UNI-RANK,A
- In the special situation where "scores" or Ranks represent a SEQUENCE of
- cases, the so-called "Test for Runs Up and Down" can be used to test for a
- TREND, i.e., a tendency for scores to increase or decrease over a sequence.
- If data are NOT SEQUENCED and NOT RANKED, your best alternative is to
- categorize the data and to apply a test designed for "Partially Ordered"
- data (One-Sample Kolmogorov-Smirnov Test) or Nominal data (Chi-Square
- Goodness-of-Fit Test). There is no Univariate test for UNSEQUENCED RANKS.
- *S-UNI-PART,A
- The Kolmogorov-Smirnov One-Sample Test is recommended for a Categorized
- Ordinal ("Partially Ordered") variable. It tests the Null Hypothesis that
- the random sample was drawn from a Population with some specified Proportion
- of cases in the various categories: you specify these Proportions based on
- theory or prior information about the Population.
- *S-UNI-INT,A
- Use the One-Sample t-Test to determine whether it is likely that your
- sample was DRAWN FROM A POPULATION WITH A KNOWN (or guessed) MEAN, which
- you specify in the Null Hypothesis. Besides requiring INTERVAL MEASUREMENT,
- valid application of this test assumes the sample was drawn from a NORMALLY
- DISTRIBUTED POPULATION. Check to see that your data adequately meet these
- assumptions: most intro. texts explain conditions under which they may be
- relaxed.
- If you're interested in estimating the MEAN of the POPULATION from which
- your RANDOM SAMPLE was drawn, compute CONFIDENCE LIMITS FOR THE MEAN.
- If you're interested in the SHAPE of your variable's distribution, use
- the Chi-Square Goodness-of-Fit Test to see if it's likely that your SAMPLE
- was drawn from a POPULATION with an hypothesized proportion of cases in its
- various categories. You specify the Population Proportions (P) in the NULL
- Hypothesis and multiply each P by Sample N to get EXPECTED FREQUENCIES for
- the test. Within limits, you may hypothesize any set of P's derived from
- theory or prior knowledge of a population. If you get the P's from a table
- of the Normal Distribution, you can use the Chi-Square Goodness-of-Fit Test
- to see whether it's likely that your sample came from a NORMALLY DISTRIBUTED
- POPULATION.
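- [The One-Sample t-Test and Confidence Limits for the Mean can be sketched
- in Python with SciPy (not part of WATSTAT); the scores and the hypothesized
- Population mean of 100 are made up.]

```python
# One-Sample t-Test against a hypothesized mean, plus a 95% Confidence
# Interval for the Population mean.
import numpy as np
from scipy import stats

scores = np.array([103, 98, 110, 105, 99, 107, 102, 96, 104, 108])
mu_null = 100                          # mean under the Null Hypothesis

t_stat, p_value = stats.ttest_1samp(scores, mu_null)

mean = scores.mean()
sem = stats.sem(scores)                # standard error of the mean
lo, hi = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```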
- *S-2SAMPLE-INT,A
- Use Student's t-Test to compare TWO SUB-SAMPLE MEANS on an INTERVAL
- DEPENDENT VARIABLE, where RANDOM SAMPLING or RANDOM ASSIGNMENT of cases has
- yielded INDEPENDENT SUB-SAMPLES. Valid application of this test assumes:
- 1) that sub-samples were drawn from two NORMALLY DISTRIBUTED POPULATIONS, &
- 2) that the two parent POPULATIONS have EQUAL VARIANCES. Check to see that
- your data approximate these assumptions: most intro. texts list conditions
- under which these assumptions may be relaxed. A special form of the t-test
- is available in cases where population variances are unequal.
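- [Both forms of the test can be sketched in Python with SciPy (not part of
- WATSTAT); the two hypothetical sub-samples below have equal sizes, so the
- pooled and unequal-variance (Welch) statistics coincide.]

```python
# Student's t-Test for 2 INDEPENDENT sub-samples, with and without the
# equal-variances assumption.
import numpy as np
from scipy import stats

group_a = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 14.0, 13.2, 12.5])
group_b = np.array([10.9, 12.2, 11.5, 10.4, 11.8, 12.0, 11.1, 10.7])

t_pooled, p_pooled = stats.ttest_ind(group_a, group_b)  # equal variances
t_welch, p_welch = stats.ttest_ind(group_a, group_b,
                                   equal_var=False)     # Welch's form
print(f"pooled: t = {t_pooled:.2f}, p = {p_pooled:.4f}")
print(f"Welch:  t = {t_welch:.2f}, p = {p_welch:.4f}")
```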
- *S-2MATCH-INT,A
- Use the Matched-Pairs t-Test to compare TWO SUB-SAMPLE MEANS on an
- INTERVAL DEPENDENT VARIABLE, where RANDOM SAMPLING or RANDOM ASSIGNMENT has
- yielded MATCHED (dependent) SUB-SAMPLES. Valid application of this test
- assumes that sub-samples were drawn from 2 NORMALLY DISTRIBUTED POPULATIONS.
- Check to see that your data approximate this assumption: most intro. texts
- list conditions under which it may be relaxed.
- *ARCSINE,A
- A number of tests are available for comparing 2 dichotomous sub-samples,
- in cases where RANDOM SAMPLING OR RANDOM ASSIGNMENT has yielded INDEPENDENT
- SUB-SAMPLES. (They are listed in order of preference.) The Arcsine Test is
- the preferred alternative, especially if sample size is small. A Chi-Square
- Contingency Test, with data cast in a 2-by-2 table, gives similar results
- when sample size is large. For smaller samples, Fisher's Exact Test may be used.
- Special forms of the z-test and t-test, which test for DIFFERENCES IN PRO-
- PORTIONS, are also applicable. Consult a statistics text for the assump-
- tions underlying each of these tests.
- *FISHER-EXACT,A
- Fisher's Exact Test is usually the best alternative for detecting a
- difference between INDEPENDENT SUB-SAMPLES when sample size is very small
- and data can be cast in a 2-by-2 contingency table. Fisher's Exact Test is
- also used as an alternative to the Chi-Square Contingency Test when sample
- size is too small to apply the latter: in such cases it is used to test for
- the significance of an ASSOCIATION BETWEEN 2 DICHOTOMOUS NOMINAL VARIABLES.
- Although not widely known, Fisher's Exact Test can be extended to tables
- larger than a 2-by-2: the only problem is finding a computer program that
- calculates p-values for larger tables.
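- [For the 2-by-2 case, the test can be sketched in Python with SciPy (not
- part of WATSTAT); the small cell frequencies below are hypothetical.]

```python
# Fisher's Exact Test on a small 2x2 contingency table, where the
# Chi-Square Contingency Test would be unsafe.
import numpy as np
from scipy.stats import fisher_exact

table = np.array([[8, 2],
                  [1, 5]])             # made-up small frequencies

odds_ratio, p_exact = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.1f}, exact p = {p_exact:.4f}")
```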
- *MCNEMAR,A
- The McNemar Test is designed to compare a DICHOTOMOUS DEPENDENT VARIABLE
- across 2 MATCHED SUB-SAMPLES. The Dependent variable may be inherently
- dichotomous or transformed to a dichotomy especially for the test. There is
- NO TEST designed to compare a Dependent variable with 3 or more categories
- across Matched Sub-Samples.
- The McNemar Test assumes only Nominal Measurement, but if an Ordinal
- Dependent variable is dichotomized at the Overall Median, it can be used as
- a test for differences between Medians for MATCHED SAMPLES.
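- [The McNemar computation can be sketched in Python with SciPy (not part of
- WATSTAT); the matched-pairs counts below are hypothetical. Only the two
- "change" cells enter the statistic.]

```python
# McNemar Test on hypothetical matched-pairs counts (e.g., Yes/No at
# two time points), using the continuity-corrected chi-square form.
from scipy.stats import chi2

#                Time 2: No  Time 2: Yes
table = [[20, 15],          # Time 1: No
         [5, 30]]           # Time 1: Yes

b, c = table[0][1], table[1][0]               # discordant pairs: 15 and 5
chi2_stat = (abs(b - c) - 1) ** 2 / (b + c)   # with continuity correction
p_value = chi2.sf(chi2_stat, df=1)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.4f}")
```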
- *MEDIAN-TEST,A
- The Median Test is designed to compare 2 INDEPENDENT SUB-SAMPLES when
- the DEPENDENT VARIABLE is ORDINAL and when it is feasible to determine the
- OVERALL MEDIAN OF THE TOTAL SAMPLE. Although tests based on ranks are
- preferable, the Median Test is a good alternative when data are "Partially
- Ordered" or when sample size is so large that it is infeasible to rank the data.
- The Median Test is really a "transformation" rather than a distinct test:
- data are cast in a 2-by-2 contingency table by breaking the Dependent vari-
- able at the overall Median; then either the Chi-Square Contingency Test or
- Fisher's Exact Test is applied, depending on sample size.
- The Median Test can also be applied when there are 3 or More INDEPENDENT
- SUB-SAMPLES. In this case, the Dependent variable is again Dichotomized at
- the OVERALL MEDIAN, but data are cast in a 2-by-k contingency table, where
- k is the number of sub-samples. Then the Chi-Square Contingency Test is
- applied.
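- [The 3-or-more-sub-sample case can be sketched in Python with SciPy (not
- part of WATSTAT), whose median_test routine splits each case at the overall
- Median and applies the Chi-Square Contingency Test; the three hypothetical
- sub-samples below are made up.]

```python
# Median Test for 3 INDEPENDENT sub-samples: dichotomize at the OVERALL
# Median, cast data in a 2-by-k table, then apply the Chi-Square test.
from scipy.stats import median_test

g1 = [55, 62, 48, 70, 66, 59]
g2 = [41, 52, 44, 58, 39, 47]
g3 = [63, 71, 49, 68, 75, 60]

stat, p_value, grand_median, table = median_test(g1, g2, g3)
print(f"overall median = {grand_median}, chi2 = {stat:.2f}, p = {p_value:.4f}")
```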
- *WILCOX-MATCH,A
- The appropriate test for a difference between TWO MATCHED SUB-SAMPLES,
- when the ORDINAL DEPENDENT VARIABLE is scored as RANKS, is the Wilcoxon
- Matched-Pairs Test [sometimes called the Matched-Pairs Signed-Ranks Test].
- *WILCOX-RSUM,A
- Two tests, the Wilcoxon Rank-Sum Test and the Mann-Whitney U-Test, can
- be applied to test for a difference between TWO INDEPENDENT SUB-SAMPLES,
- when the ORDINAL DEPENDENT VARIABLE is scored as RANKS. These are really
- two forms of the same test and yield exactly the same p-values. Although
- the Mann-Whitney is more widely used, the Wilcoxon Rank-Sum Test is much
- easier to compute and interpret and, therefore, preferable. [Don't confuse
- this Rank-Sum Test with Wilcoxon's Matched-Pairs Test, which is used for
- DEPENDENT SUB-SAMPLES.]
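- [The equivalence of the two forms can be sketched in Python with SciPy (not
- part of WATSTAT): the U statistic and the Wilcoxon rank sum R are linked by
- U = R - n(n+1)/2. The scores below are hypothetical.]

```python
# Mann-Whitney U for 2 independent sub-samples, and the same statistic
# recovered from the Wilcoxon rank sum of the first sample.
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

a = np.array([3, 5, 8, 10, 12, 15])
b = np.array([1, 2, 4, 6, 7, 9])

u_stat, p_value = mannwhitneyu(a, b, alternative="two-sided")

ranks = rankdata(np.concatenate([a, b]))   # rank the pooled scores
rank_sum_a = ranks[:len(a)].sum()          # Wilcoxon rank sum for sample a
u_from_ranks = rank_sum_a - len(a) * (len(a) + 1) / 2
print(f"U = {u_stat}, from rank sum: {u_from_ranks}, p = {p_value:.3f}")
```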
- *ONEWAY,A
- The appropriate significance test for differences between Means of three
- or more INDEPENDENT SUB-SAMPLES is the so-called "ONE-WAY ANOVA F-TEST."
- This is an "overall" test: it detects differences between pairs or combina-
- tions of sub-samples, but it can't specify which sub-samples differ. Thus,
- it must be followed by more specific tests, called CONTRASTS, to pinpoint
- which sub-samples differ. Besides assuming INDEPENDENT SUB-SAMPLES and
- INTERVAL MEASUREMENT, this F-Test assumes that sub-samples were drawn from
- NORMALLY DISTRIBUTED POPULATIONS that have EQUAL VARIANCES. Check to see
- that your data approximate all these assumptions: most intro. texts specify
- conditions under which they may be relaxed. Consult a specialized text on
- Analysis of Variance (ANOVA) for help in selecting a test for CONTRASTS
- following the overall F-Test. [Usually, the Duncan Multiple-Range Test is
- best for Contrasts between PAIRS of sub-samples and the Scheffe Test best
- for Contrasts between GROUPS of sub-samples, but there are many other alter-
- natives that may be preferable in your case.]
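- [The overall F-Test can be sketched in Python with SciPy (not part of
- WATSTAT); the three hypothetical sub-samples are made up, and the follow-up
- CONTRASTS are not shown.]

```python
# One-Way ANOVA F-Test for differences among the Means of 3 independent
# sub-samples.
from scipy.stats import f_oneway

g1 = [23, 25, 21, 27, 24]
g2 = [30, 33, 29, 35, 31]
g3 = [26, 24, 28, 25, 27]

f_stat, p_value = f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, p = {p_value:.5f}")
```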
- *TWOWAY,A
- The best significance test for differences between Means of 3 or more
- MATCHED SUB-SAMPLES is ANALYSIS OF VARIANCE F-TEST FOR RANDOMIZED BLOCKS,
- which is sometimes loosely called "TWO-WAY" ANOVA. In this design, "Blocks"
- may be individual cases or sets of matched cases, which are represented in
- all the sub-samples. Blocks are used to "control" extraneous between-case
- variation. When individual cases appear in all the sub-samples, the design
- is referred to as a RANDOMIZED BLOCKS DESIGN WITH REPEATED MEASURES.
- The F-Test is an "overall" test: it detects differences between pairs or
- combinations of sub-samples, but it can't specify which sub-samples differ.
- Thus, it must be followed by more specific tests, called CONTRASTS, to pin-
- point which sub-samples differ. Besides assuming INTERVAL MEASUREMENT, this
- F-Test assumes that sub-samples were drawn from NORMALLY DISTRIBUTED POPULA-
- TIONS that have EQUAL VARIANCES. Check to see that your data approximate
- all these assumptions. Specialized texts on Analysis of Variance (ANOVA)
- usually contain extensive explanations of underlying assumptions and also
- offer help in selecting a test for CONTRASTS following the overall F-Test.
- *CR-FACTORIAL,A
- ANALYSIS OF VARIANCE with a COMPLETELY RANDOMIZED FACTORIAL (CRF) design
- is the best alternative when you have: 1) an INTERVAL DEPENDENT VARIABLE,
- 2) TWO OR MORE COMPARISON VARIABLES, and 3) NO MATCHING of cases across
- sub-samples of any Comparison Variable. [The last condition implies that
- each case appears in the analysis one and only one time.]
- The CRF design yields an F-Test for each Comparison Variable and also
- for INTERACTION EFFECTS due to sets of these variables. The F-Tests are
- "overall" tests: they detect differences between pairs or combinations of
- sub-samples, but don't specify which sub-samples differ. Thus, they must
- be followed by more specific tests, called CONTRASTS, to pinpoint which
- sub-samples differ. Besides INTERVAL MEASUREMENT, the F-Tests assume that
- the sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have
- EQUAL VARIANCES. Check to see that your data approximate all these assump-
- tions. Specialized texts on Analysis of Variance usually contain extensive
- explanations of underlying assumptions and the conditions under which they
- may be relaxed. Only a few offer help in selecting the most appropriate
- test for CONTRASTS in CRF Designs.
- *RB-FACTORIAL,A
- ANALYSIS OF VARIANCE with a RANDOMIZED BLOCKS FACTORIAL (RBF) design is
- the best alternative if you have: 1) an INTERVAL DEPENDENT VARIABLE, 2) TWO
- OR MORE COMPARISON VARIABLES, and 3) MATCHED CASES or OBSERVATIONS across
- sub-samples of one or more Comparison Variables. In this design, "Blocks"
- may be individual cases or sets of matched cases, which are represented in
- all the sub-samples of a Comparison Variable. Blocks are used to "control"
- extraneous between-case variation. When individual cases appear in all the
- sub-samples of any Comparison Variable, the design is referred to as a
- RANDOMIZED BLOCKS FACTORIAL DESIGN WITH REPEATED MEASURES. When the Blocks
- are split into "Sub-Blocks" on one or more "Blocking Variables" the design
- is referred to as a SPLIT-PLOT DESIGN.
- The RBF design yields an F-Test for each Comparison Variable and also
- for INTERACTION EFFECTS due to sets of these variables. The F-Tests are
- "overall" tests: they detect differences between pairs or combinations of
- sub-samples, but don't specify which sub-samples differ. Thus, they must
- be followed by more specific tests, called CONTRASTS, to pinpoint which of
- the sub-samples differ. Besides INTERVAL MEASUREMENT, the F-Tests assume
- that sub-samples were drawn from NORMALLY DISTRIBUTED POPULATIONS that have
- EQUAL VARIANCES. Check to see that your data approximate all these assump-
- tions. Specialized texts on Analysis of Variance usually contain extensive
- explanations of underlying assumptions and the conditions under which they
- may be relaxed. Only a few offer help in selecting the most appropriate
- test for CONTRASTS in RBF or Split-Plot Designs.
- *ANOVA/REGN,A
- [Traditional ANOVA computations for the above design require EQUAL FREQUEN-
- CIES in all the cells created when the sample is split by 2 or more Compar-
- ison Variables. If cell frequencies are unequal, F-Ratios can be obtained
- through Multiple Regression procedures, of which ANOVA is a special case.
- Most computer programs use Multiple Regression for all ANOVA problems, but
- hide this fact by reporting results in a conventional ANOVA Summary Table.]
- *ANCOVA,A
- If you have one or more Independent variables that you wish to "control"
- or "adjust for" without building them in as Comparison Variables, you can
- apply ANALYSIS OF COVARIANCE (ANCOVA) procedures. ANCOVA is an extension of
- ANOVA in which the effects of one or more INTERVAL-LEVEL INDEPENDENT VARI-
- ABLES are "partialled out," through Multiple Regression procedures, before
- F-Ratios are computed for the major Comparison Variables. Normally, vari-
- ables are selected for such adjustment because they create "extraneous"
- variation in the Dependent Variable and can't be eliminated physically.
- ANCOVA usually requires a computer and most popular statistical packages
- can perform it. To use ANCOVA, you must meet all the assumptions of ANOVA
- and Multiple Regression, plus some additional ones unique to this procedure.
- Specialized texts on Analysis of Variance usually explain all these assump-
- tions and the conditions under which they may be relaxed.
- *MANOVA,A
- MULTIVARIATE ANALYSIS OF VARIANCE (MANOVA) is an extension of ANOVA
- designed to handle two or more INTERVAL-LEVEL DEPENDENT VARIABLES simulta-
- neously. The application of MANOVA and the interpretation of its results
- require advanced statistical training. If you lack such expertise, and if
- your theory demands MANOVA, it would be wise to seek help from a statistical
- consultant before attempting to apply it. It may be wiser yet to choose a
- procedure that can be applied in separate analyses for each Dependent vari-
- able. If the latter alternative is feasible, WATSTAT may be able to offer
- more help: return to the Choice Boxes and select "Multivariate with ONE
- Dependent Variable" in Box 4.
- *CHI-LOGIST,A
- Significance tests associated with Logistic Regression PARALLEL those
- used with Linear Multiple Regression: there are tests for overall fit of
- the equation as well as for individual Regression Coefficients. However,
- as Logistic Regression is based on a different equation-fitting criterion,
- neither the tests nor their interpretations are IDENTICAL to their Linear
- counterparts. Logistic Regression also has its own set of assumptions and
- limitations, which you'll need to consider.
- *CHI-COMP-NOM,A
- Use the Chi-Square Contingency Test to determine whether it is likely
- that your RANDOM SAMPLE was drawn from a set of Sub-Populations (correspond-
- ing to your Sub-Samples) that have the same proportion of cases in the
- various categories of the Dependent Variable. [Chi-Square must be computed
- on RAW FREQUENCIES: don't make the common beginner's error of computing it
- from a table of Percentages or Proportions.]
- *CHI-PHI,A
- The appropriate significance test for the Phi Coefficient or Cramer's V
- is the Chi-square Contingency Test. Fisher's Exact Test may be used as a
- test for Phi if sample size is too small for the Chi-Square Test.
- *TTEST-BIV-R,A
- A special t-Test or F-Test is used to test for the significance of the
- Correlation Coefficient (r) or the Regression Coefficient (b). In the
- bivariate case, the t and F Tests yield exactly the same p-values, and the
- tests for r and b are equivalent. Besides requiring INTERVAL MEASUREMENT,
- these tests
- assume BIVARIATE NORMALITY. Check to see that your data approximate this
- assumption: most intro. texts list conditions under which it may be relaxed.
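- The t-Test formula is simple enough to sketch (illustration only; the r
- and n values below are hypothetical):

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for the Null Hypothesis that the population correlation
    is zero; referred to the t distribution with n - 2 df.  The equivalent
    F-Test statistic is simply t squared, with (1, n - 2) df."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: r = 0.5 from a random sample of n = 27 cases
print(round(t_for_r(0.5, 27), 4))  # 2.8868
```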
- *TTEST-RHO,A
- A special t-Test is used to test for the significance of Spearman's Rho.
- The computing formula for this test is the same as that used for the Linear
- Correlation Coefficient (r) except that Rho replaces r in the computations.
- *ZTEST-TAU,A
- The significance test for Kendall's Tau uses a z-statistic, which is
- referred to a table of the Standard Normal Distribution to obtain p-values.
- For sample sizes less than 10, exact tables are available and should be used
- instead of the Normal approximation.
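- The Normal approximation can be sketched as follows (illustration only;
- the tau and n values are hypothetical, and the formula assumes no ties):

```python
from math import sqrt

def z_for_tau(tau, n):
    """z statistic for Kendall's Tau under the Null Hypothesis of no
    association (no-ties case); referred to the Standard Normal table."""
    return 3 * tau * sqrt(n * (n - 1)) / sqrt(2 * (2 * n + 5))

# Hypothetical example: tau = 0.4 computed from n = 20 paired ranks
print(round(z_for_tau(0.4, 20), 4))  # 2.4658
```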
- *FTEST-ETA,A
- The significance test used for the Correlation Ratio (Eta) is the F-Test
- obtained from a ONE-WAY ANALYSIS OF VARIANCE.
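- A sketch of both computations in modern Python (illustration only; the
- scores below are hypothetical).  The F-Test has (k - 1, N - k) df, where
- k is the number of Sub-Samples and N the total sample size:

```python
def eta_and_f(groups):
    """Eta-squared (Correlation Ratio, squared) and the One-Way ANOVA
    F statistic.  `groups` is a list of lists: the Dependent-variable
    scores within each Sub-Sample."""
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_within = ss_total - ss_between
    eta_sq = ss_between / ss_total
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return eta_sq, f

# Hypothetical scores for 2 Sub-Samples:
eta_sq, f = eta_and_f([[1, 2, 3], [4, 5, 6]])
print(round(eta_sq, 4), round(f, 2))  # 0.7714 13.5
```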
- *FTEST-MULTR,A
- An F-Test is used to test for the significance of the Multiple Correla-
- tion Coefficient. A special t-Test or F-Test (yielding identical p-values)
- is used to test the significance of each Regression Coefficient in the equa-
- tion. F-Tests for "R-Square Change" can be used to test whether a set of
- two or more Independent Variables contributes significantly to the fit of
- the equation. Valid application of these tests rests on many stringent
- assumptions: consult a Multiple Regression/Correlation text for information
- about these assumptions and check to see that your data meet them.
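- The overall F-Test and the "R-Square Change" F-Test can be sketched as
- follows (illustration only; all numeric values below are hypothetical):

```python
def f_for_multiple_r(r_squared, n, k):
    """Overall F-Test for the Multiple Correlation Coefficient:
    k Independent Variables, n cases; df = (k, n - k - 1)."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

def f_for_r_squared_change(r2_full, r2_reduced, n, k_full, m):
    """F-Test for "R-Square Change": m variables added to the reduced
    equation; df = (m, n - k_full - 1)."""
    return ((r2_full - r2_reduced) / m) / ((1 - r2_full) / (n - k_full - 1))

# Hypothetical example: R-squared = 0.40 with 3 predictors and n = 50,
# of which the first predictor alone gave R-squared = 0.30
print(round(f_for_multiple_r(0.40, 50, 3), 3))               # 10.222
print(round(f_for_r_squared_change(0.40, 0.30, 50, 3, 2), 4))  # 3.8333
```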
- *S-LOG-LIN,A
- Several significance tests are usually applied in a Log-Linear Analysis,
- all of which are referred to the Chi-Square Distribution to obtain p-values.
- In addition to a test for overall fit of a Log-Linear Model (analogous to a
- test for R-Squared in Regression), tests are usually made for MAIN EFFECTS
- and INTERACTION EFFECTS (analogous to F-Tests in Analysis of Variance).
- *S-DISCRIM,A
- Several F-Tests are usually applied in a Discriminant Analysis, includ-
- ing: a test for fit of each discriminant function, tests for the contribu-
- tion of each Discriminant Function Coefficient, and tests for differences
- between groups. Computer programs also use significance tests as criteria
- for including variables and for terminating the analysis. [The validity of
- these criteria, like ALL significance tests, rests on the assumption of
- Random Sampling.]
- *S-FACTOR-ANAL,A
- Numerous tests can be applied in Factor Analysis, including tests for
- Factor Loadings, Correlations between Factors, and the Number of Factors.
- When the focus is on description, as it is in so-called "Exploratory Factor
- Analysis," there is usually no need for any tests. However, significance
- tests become central when the Factor Analysis is used to address theoretical
- hypotheses, as in "Confirmatory Factor Analysis."
- *S-KENDALL-W,A
- The significance test for Kendall's W uses exact tables when sample
- size and the number of variables are small. Otherwise, a Chi-Square stat-
- istic is used. The Null Hypothesis tested is that the sample was drawn
- from a population in which the variables are mutually Independent.
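- The large-sample computation can be sketched as follows (illustration
- only; the rankings below are hypothetical, and the formula assumes no
- tied ranks):

```python
def kendalls_w(rankings):
    """Kendall's W and its Chi-Square statistic (df = n - 1; no-ties
    case).  `rankings` is a list of m rank-orderings, each assigning
    ranks 1..n to the same n items."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return w, m * (n - 1) * w

# Hypothetical example: 3 judges rank 4 items in perfect agreement
w, chi2 = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(w, chi2)  # 1.0 9.0
```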
- *S-COCHRANQ,A
- Cochran's Q Test is designed to compare a DICHOTOMOUS DEPENDENT VARIABLE
- across 3 or more MATCHED SUB-SAMPLES. The Dependent variable may be inher-
- ently dichotomous or transformed to a dichotomy especially for the Q-test.
- There is NO TEST designed to compare a Dependent variable with 3 or more
- categories across Matched Sub-Samples.
- Cochran's Q Test assumes only Nominal Measurement, but if an Ordinal
- Dependent variable is dichotomized at the OVERALL MEDIAN, it can be used to
- test the Null Hypothesis that Matched Sub-Samples were RANDOMLY drawn from
- Populations with the same Median.
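- The Q statistic itself can be sketched as follows (illustration only;
- the 0/1 data below are hypothetical):

```python
def cochrans_q(table):
    """Cochran's Q statistic.  Rows are matched cases, columns are the
    k conditions, and each entry is 0 or 1 on the DICHOTOMOUS Dependent
    variable.  Q is referred to Chi-Square with k - 1 df."""
    k = len(table[0])
    col_totals = [sum(col) for col in zip(*table)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)  # total number of 1s in the whole table
    numerator = (k - 1) * (k * sum(c ** 2 for c in col_totals) - n ** 2)
    denominator = k * n - sum(r ** 2 for r in row_totals)
    return numerator / denominator

# Hypothetical data: 4 matched cases observed under 3 conditions
q = cochrans_q([[1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 1, 0]])
print(round(q, 4))  # 4.6667
```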
- *KRUSKAL,A
- The Kruskal-Wallis Test is designed to compare an ORDINAL DEPENDENT
- VARIABLE across 3 or more INDEPENDENT SUB-SAMPLES. If the Dependent vari-
- able is not inherently Ranked it must be transformed to Ranks for the test.
- The Kruskal-Wallis Test is an analogue of One-Way ANOVA and uses a
- Chi-Square test statistic in place of the ANOVA F-Test.
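- The H statistic (referred to Chi-Square with k - 1 df) can be sketched
- as follows (illustration only; the scores below are hypothetical, and
- no correction for ties is applied):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction).  `groups`
    are the Independent Sub-Samples; all scores are pooled and ranked,
    tied scores receiving the average of the ranks they occupy."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank = {}
    for v in set(pooled):
        positions = [i + 1 for i, x in enumerate(pooled) if x == v]
        rank[v] = sum(positions) / len(positions)
    return 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups) - 3 * (n + 1)

# Hypothetical scores for 3 Independent Sub-Samples
print(round(kruskal_wallis_h([[1, 2], [3, 4], [5, 6]]), 4))  # 4.5714
```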
- *FRIEDMAN,A
- The Friedman Test is designed to compare an ORDINAL DEPENDENT VARIABLE
- across 3 or more MATCHED SUB-SAMPLES. If the Dependent variable is not
- inherently Ranked it must be transformed to Ranks for the test. This test
- is an analogue of "Two-Way ANOVA" (Randomized Blocks ANOVA) and uses a
- Chi-Square test statistic in place of the ANOVA F-Test.
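- The Friedman statistic (referred to Chi-Square with k - 1 df) can be
- sketched as follows (illustration only; the data below are hypothetical):

```python
def friedman_chi_square(blocks):
    """Friedman Chi-Square statistic.  Each row of `blocks` is one
    matched set of k scores; scores are ranked WITHIN each row, tied
    scores receiving the average of the ranks they occupy."""
    n, k = len(blocks), len(blocks[0])
    col_rank_sums = [0.0] * k
    for row in blocks:
        srt = sorted(row)
        for j, x in enumerate(row):
            positions = [i + 1 for i, v in enumerate(srt) if v == x]
            col_rank_sums[j] += sum(positions) / len(positions)
    return (12 / (n * k * (k + 1))) * sum(r ** 2 for r in col_rank_sums) \
        - 3 * n * (k + 1)

# Hypothetical data: 3 matched sets measured under 3 conditions
print(round(friedman_chi_square([[1, 2, 3], [1, 2, 3], [2, 1, 3]]), 4))
# 4.6667
```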
- *S-COMP2-RANK,A
- There is no well-known significance test for Ordinal data that can
- handle 2 or more Independent (Comparison) Variables in a single analysis.
- That is, there are no Ordinal-Level analogues to Factorial ANOVA, Analysis
- of Covariance, etc., which are used with Interval Dependent Variables.
- *S-COMP2-DICH,A
- There is no test designed to compare a DICHOTOMOUS DEPENDENT VARIABLE
- across SUB-SAMPLES created by 2 or more Independent (Comparison) variables.
- However, if it's appropriate to shift the Analytical Focus from "Sub-Sample
- Comparison" to "Association," a number of alternatives are open. Among
- these are Logistic Regression and Discriminant Analysis. If your Analytical
- Focus can be changed in this way -- if it MAKES SENSE to cast your research
- questions in terms of Association -- return to WATSTAT's Choice Boxes and
- select "No Sub-Sample Comparisons" in Box 2 and "Describe Association" in
- Box 3. WATSTAT's Report will then give you more information about Logistic
- Regression and Discriminant Analysis.
- *S-COMP2-NOM-IND,A
- There is no test designed to compare a NOMINAL DEPENDENT VARIABLE across
- SUB-SAMPLES created by 2 or more Independent (Comparison) variables.
- If it's appropriate to change your Analytical Focus from "Sub-Sample
- Comparison" to "Association," a number of alternatives are open, namely,
- Log-Linear Analysis, Logistic Regression, and Discriminant Analysis. If it
- MAKES SENSE to re-cast your research questions in terms of Association,
- return to WATSTAT's Choice Boxes and select "No Sub-Sample Comparisons" in
- Box 2 and "Describe Association" in Box 3. WATSTAT's Report will then give
- you more information about the above alternatives. [All these alternatives
- require advanced statistical training: a wise novice will seek expert help.]
- *S-COMP2-NOM-MATCH,A
- There is NO MULTIVARIATE TEST designed to compare a NOMINAL DEPENDENT
- VARIABLE across MATCHED SUB-SAMPLES created by 2 or more Comparison vari-
- ables. If you haven't yet collected the data, consider ways to achieve an
- Interval-Level measure of the Dependent variable. If the data are already
- collected, and if it's appropriate and feasible to dichotomize the Dependent
- variable, you may be able to use ANOVA F-Tests. [This will also require a
- so-called ARCSINE TRANSFORMATION before ANOVA can be applied to a Dichotomous
- Dependent variable.] If either of these options is viable in your case,
- return to WATSTAT's Choice Boxes and select "Interval" in Box 5.
- *COPYRIGHT,A
- COPYRIGHT 1991 BY HAWKEYE SOFTWORKS, 300 GOLFVIEW AVE., IOWA CITY, IA, 52246